Method and device for obtaining video clip, server, and storage medium

ABSTRACT

The present application belongs to the technical field of audio and video, and relates to a method and device for obtaining a video clip, a server, and a storage medium. The method includes, when a clip is to be obtained from live streaming video data of a performance live streaming room, using audio data of the live streaming video data and audio data of an original performer to determine a target time point parameter of the live streaming video data. The method further includes obtaining a target video clip according to a start time point and an end time point in the target time point parameter. The present application can be used to capture a more complete video clip.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of U.S. application Ser. No. 17/257,447, filed Dec. 31, 2020, which claims priority to PCT Application No. PCT/CN2019/113321, filed on Oct. 25, 2019, which claims priority to Chinese Patent Application No. 201811334212.8, filed with the China National Intellectual Property Administration on Nov. 9, 2018, and entitled “METHOD AND DEVICE FOR OBTAINING VIDEO CLIP, SERVER, AND STORAGE MEDIUM”, the disclosures of which are incorporated herein by reference in their entireties.

TECHNICAL FIELD

The present application relates to the field of audio and video technologies, and in particular to a method and device for obtaining a video clip, a server, and a storage medium.

BACKGROUND

With the development of computer and network technologies, more and more live streaming applications have emerged. A person can log in to a live streaming application to watch the live streaming program of a host in a live streaming room of interest. While watching the program, the person can record video clips of wonderful content upon noticing such content, and then store the recorded video clips in his/her terminal or share them with friends.

A recording button is provided in a live streaming interface. After detecting an operation instruction indicating that the recording button is operated, the terminal can use a screen recording function provided by the operating system of the terminal to start recording the video data displayed on the screen. After detecting an operation instruction indicating that the recording button is operated again, the terminal ends the recording of the video data displayed on the screen. In this way, video clips with wonderful content can be obtained by recording.

In the process of implementing the present application, the inventor found that the related art has at least the following problem.

People only operate the recording button after they have seen the wonderful content, and the terminal only starts recording the video data displayed on the screen after detecting the operation instruction indicating that the recording button is operated. As a result, there is a time interval between the moment people see the wonderful content and the moment the terminal starts recording, and the wonderful content within this interval cannot be recorded, which makes the recorded video clips of the wonderful content incomplete.

SUMMARY

Implementations of the present application provide a method and device for obtaining a video clip, a server, and a storage medium.

According to a first aspect of the implementations of the present application, there is provided a method for obtaining a video clip. The method includes obtaining live streaming video data in a performance live streaming room. The method includes determining target time point pairs of the live streaming video data based on audio data of the live streaming video data and audio data of an original performer. Each of the target time point pairs includes a start time point and an end time point. The method includes obtaining a target video clip from the live streaming video data based on the target time point pairs.

According to a second aspect of the implementations of the present application, there is provided a device for obtaining a video clip. The device includes an obtaining unit configured to obtain live streaming video data in a performance live streaming room. The device includes a determining unit configured to determine target time point pairs of the live streaming video data based on audio data of the live streaming video data and audio data of an original performer. Each of the target time point pairs includes a start time point and an end time point. The obtaining unit is further configured to obtain a target video clip from the live streaming video data based on the target time point pairs.

According to a third aspect of the implementations of the present application, there is provided a server. The server includes a processor and a memory for storing instructions executable by the processor. The processor is configured to perform a method for obtaining a video clip. The method includes obtaining live streaming video data in a performance live streaming room. The method includes determining target time point pairs of the live streaming video data based on audio data of the live streaming video data and audio data of an original performer. Each of the target time point pairs includes a start time point and an end time point. The method includes obtaining a target video clip from the live streaming video data based on the target time point pairs.

According to a fourth aspect of the implementations of the present application, there is provided a non-transitory computer-readable storage medium having stored therein instructions which, when executed by a processor of a server, cause the server to perform a method for obtaining a video clip. The method includes obtaining live streaming video data in a performance live streaming room. The method includes determining target time point pairs of the live streaming video data based on audio data of the live streaming video data and audio data of an original performer. Each of the target time point pairs includes a start time point and an end time point. The method includes obtaining a target video clip from the live streaming video data based on the target time point pairs.

According to a fifth aspect of the implementations of the present application, there is provided an application program, including one or more instructions which can be executed by a processor of a server to carry out a method for obtaining a video clip. The method includes obtaining live streaming video data in a performance live streaming room. The method includes determining target time point pairs of the live streaming video data based on audio data of the live streaming video data and audio data of an original performer. Each of the target time point pairs includes a start time point and an end time point. The method includes obtaining a target video clip from the live streaming video data based on the target time point pairs.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to explain the technical solutions in the implementations of the present application more clearly, the drawings used in the implementations are briefly introduced below. It is apparent that the drawings in the following description show only some of the implementations of the present application, and those skilled in the art may obtain other drawings based on these drawings without creative work.

FIG. 1 is a flowchart showing a method for obtaining a video clip according to an example implementation;

FIG. 2 is a schematic diagram showing display of link information of a video clip according to an example implementation;

FIG. 3 is a schematic diagram showing a first time period according to an example implementation;

FIG. 4 is a structural block diagram showing a device for obtaining a video clip according to an example implementation;

FIG. 5 is a structural block diagram showing a device for obtaining a video clip according to an example implementation;

FIG. 6 is a structural block diagram showing a server according to an example implementation; and

FIG. 7 is a structural block diagram showing another server according to an example implementation.

DETAILED DESCRIPTION

In order to make the purposes, technical solutions, and advantages of the present application clearer, the present application is described below in detail with reference to the drawings and by way of implementations. Obviously, the implementations described here are only some of the implementations of the present application, rather than all of them. All other implementations obtained by those of ordinary skill in the art based on the implementations in the present application without creative work fall within the protection scope of the present application.

The implementations of the present application provide a method for obtaining a video clip, which can be performed by a server. The server may be a background server of a live streaming application, or a Content Delivery Network (CDN) server. The server can be provided with a processor, a memory, a transceiver, etc. The processor can be configured to perform processes such as obtaining and distributing the video clip, and the memory can be configured to store data required or generated in the process of obtaining the video clip, such as the video data of the video clip, the live streaming video data, and so on. The transceiver can be configured to receive and transmit data such as the live streaming video data, comment information, and link information of the video clip.

Before the solution for obtaining a video clip provided by the implementations of the present application is explained, application scenarios of the implementations of the present application are introduced first.

For convenience of description, a terminal used by a host is referred to as a user terminal, and the background server of the live streaming application is referred to as a server. The above-mentioned live streaming application is installed in the user terminal.

After the host starts live streaming in the performance live streaming room through the live streaming application installed in the user terminal, the user terminal obtains the live streaming video data of the host and sends it to the server. In response to receiving the live streaming video data sent by the user terminal, the server can obtain a target video clip from the received live streaming video data.

Alternatively, after the host starts live streaming in the performance live streaming room through the live streaming application installed in the user terminal, the user terminal obtains the live streaming video data of the host and sends it to the server. In response to receiving the live streaming video data, the server stores it, and can obtain the target video clip from the stored live streaming video data after the live streaming in the performance live streaming room ends.

The implementations of the present application provide a method for obtaining a video clip. As shown in FIG. 1, the execution flow of the method can include the following operations.

In 101, obtaining live streaming video data in a performance live streaming room.

The server can obtain the live streaming video data in the performance live streaming room. The performance live streaming room refers to a live streaming room in which a music performance is given once the live streaming starts. For example, the performance live streaming room is a live streaming room for singing songs, a live streaming room for playing musical instruments, or the like.

In an implementation of the present application, after the host starts the live streaming of the performance live streaming room through the live streaming application installed in the user terminal, the server can receive the live streaming video data sent by the user terminal and save it. In addition, the server may also determine the accounts other than the host account among the accounts logged into the performance live streaming room, and then send the received live streaming video data to the terminals used by these other accounts. For ease of presentation, the terminals used by the other accounts are referred to as login terminals. Each of the login terminals is also installed with the live streaming application. After receiving the live streaming video data, each login terminal can play it on the live streaming interface of the performance live streaming room after logging in to the live streaming application installed in the login terminal.

In an implementation of the present application, the live streaming video data includes audio data carrying sound information and video data carrying picture information. The live streaming video data can also be understood as multimedia data.

In 102, determining a target time point pair (also referred to as a target time point parameter) of the live streaming video data based on audio data of the live streaming video data and audio data of an original performer.

The server can determine the target time point pair of the live streaming video data based on the audio data of the live streaming video data and the audio data of the original performer. Since the live streaming video data is associated with the live streaming time, the time points determined in this operation are time points on the timeline of the live streaming.

The audio data of the original performer may be the audio data of a song sung by the original singer, or the audio data of a performance given by the original performer on a musical instrument. The target time point pair includes one or more time point pairs, each of which includes two time points, that is, a start time point and an end time point.

Specifically, the start time point and the end time point included in the target time point pair may be a time point indicating the start of the wonderful content of the live streaming video and a time point indicating the end of that wonderful content.

In an implementation of the present application, if the video data and the audio data in the live streaming video data are separate, the server can, after obtaining the live streaming video data, directly obtain the audio data of the original performer based on the audio data in the live streaming video data. If the video data and the audio data in the live streaming video data are mixed, the server can, after obtaining the live streaming video data, separate the video data from the audio data to obtain the audio data in the live streaming video data, and then obtain the audio data of the original performer based on that audio data.

In an implementation of the present application, after obtaining the live streaming video data in the performance live streaming room, the server can obtain introduction information of the live streaming in the live streaming room, which describes the content of the live streaming by the host. The server can then obtain the audio data of the original performer based on that content.

Specifically, the server may perform similarity matching between the audio data in the live streaming video data and the audio data of the original performer, and determine the target time point pair in the live streaming video data based on the similarity matching result.

In an implementation of the present application, a first time point may be determined first, and the target time point pair is then determined based on this time point. Accordingly, the processing in 102 can include:

determining a first time point in the live streaming video data based on the audio data of the live streaming video data and the audio data of the original performer, and determining, based on a preset interception time duration, the target time point pair corresponding to the first time point and centered at the first time point.

The preset interception time duration can be set in advance and stored in the server, and can be, for example, 10 seconds.

In an implementation of the present application, the server can determine the first time point by using the audio data in the live streaming video data and the audio data of the original performer, and obtain the preset interception time duration stored in advance. The server then takes the time point half of the preset interception time duration before the first time point as the corresponding start time point, and the time point half of the preset interception time duration after the first time point as the corresponding end time point. The start time point and the end time point form the target time point pair corresponding to the first time point.

For example, assume that the first time point is at the 20th second and the preset interception time duration is 10 seconds. Half of the preset interception time duration is 5 seconds, so the start time point corresponding to the first time point is at the (20 - 5) = 15th second, and the end time point is at the (20 + 5) = 25th second. Thus the time points at the 15th second and the 25th second form the target time point pair corresponding to the first time point.
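The centered-window rule can be sketched as follows. This is a minimal illustration, not the claimed implementation: the names `TimePointPair` and `centered_pair` are hypothetical, and clamping the window to the stream bounds is an assumption that the text does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimePointPair:
    """Hypothetical start/end pair, in seconds on the live streaming timeline."""
    start: float
    end: float


def centered_pair(first_time_point: float,
                  interception_duration: float = 10.0,
                  stream_length: Optional[float] = None) -> TimePointPair:
    """Center a window of `interception_duration` seconds on the first time point."""
    half = interception_duration / 2
    start = first_time_point - half
    end = first_time_point + half
    # Assumption (not in the text): keep the window inside the stream.
    start = max(0.0, start)
    if stream_length is not None:
        end = min(stream_length, end)
    return TimePointPair(start, end)


print(centered_pair(20, 10))  # TimePointPair(start=15.0, end=25.0)
```

With the values from the example, a first time point at 20 s and a 10 s duration give the pair (15 s, 25 s).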

Specifically, the first time point can be the time point that characterizes the wonderful content of the live streaming, and the start time point and the end time point in the target time point pair corresponding to the first time point can be the time points that characterize the start and the end of the wonderful content, respectively.

In an implementation of the present application, in response to determining that the audio data in the live streaming video data is the audio data of a song sung by the host, and the audio data of the original performer is the audio data of the song sung by the original singer, the method for determining the first time point can include:

performing voice recognition on the audio data in the live streaming video data to obtain the lyrics of the song; obtaining the audio data of the song sung by the original singer based on the obtained lyrics; for each sentence of the lyrics, determining the similarity between the audio features of the audio data of the song sung by the original singer and the audio features of the audio data in the live streaming video data, to obtain a lyric similarity of each sentence; and, for each sentence whose lyric similarity is above a first preset threshold, determining the time point in the audio data of the live streaming video data that corresponds to the position with the highest similarity in that sentence as a first time point in the live streaming video data.

The first preset threshold can be set in advance and stored in the server, and can be, for example, 90%.

In an implementation of the present application, the server may use a voice recognition algorithm stored in advance to perform voice recognition on the audio data in the live streaming video data and obtain the lyrics of the song sung by the host. The server can then query a preset lyric database, which contains lyrics and the audio data of the original singers of the songs corresponding to those lyrics, by using the obtained lyrics, to determine the audio data of the original singer of the song corresponding to the obtained lyrics. For each sentence of the lyrics, the server can determine the corresponding audio data of the song sung by the original singer and the corresponding audio data of the song sung by the host. Based on an audio feature extraction algorithm, the server extracts audio features from the audio data of the original singer and from the audio data of the host respectively, and determines the similarity between the two sets of audio features for that sentence of lyrics. The server then compares the similarity with the first preset threshold. If the similarity is greater than the first preset threshold, the server determines the position with the highest similarity in the sentence of lyrics, determines the live streaming time point of the audio data in the live streaming video data corresponding to that position, and takes this time point as a first time point of the live streaming video data. If the similarity is less than or equal to the first preset threshold, no first time point is determined for that sentence. The above processing is performed for each sentence of lyrics to determine the first time points of the live streaming video data.
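The per-sentence selection rule can be sketched as below, assuming that voice recognition, the lyric database lookup, and alignment have already produced, for each sentence, frame-aligned feature vectors for the host and the original singer together with the live streaming time of each frame. Cosine similarity and the use of the mean frame similarity as the sentence similarity are assumptions, and all function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Frame = Sequence[float]  # one audio feature vector per analysis frame


def cosine_similarity(u: Frame, v: Frame) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def first_time_point_for_sentence(host_frames: Sequence[Frame],
                                  original_frames: Sequence[Frame],
                                  frame_times: Sequence[float],
                                  threshold: float = 0.9) -> Optional[float]:
    """Return the live streaming time of the best-matching frame, or None."""
    frame_sims = [cosine_similarity(h, o) for h, o in zip(host_frames, original_frames)]
    if not frame_sims:
        return None
    # Assumption: the sentence similarity is the mean frame similarity.
    sentence_sim = sum(frame_sims) / len(frame_sims)
    if sentence_sim <= threshold:
        return None  # not above the first preset threshold
    best = max(range(len(frame_sims)), key=frame_sims.__getitem__)
    return frame_times[best]


def first_time_points(
        sentences: Sequence[Tuple[Sequence[Frame], Sequence[Frame], Sequence[float]]],
        threshold: float = 0.9) -> List[float]:
    """Apply the per-sentence rule to every lyric sentence."""
    points = []
    for host, original, times in sentences:
        t = first_time_point_for_sentence(host, original, times, threshold)
        if t is not None:
            points.append(t)
    return points
```

Each sentence contributes at most one first time point, and a sentence whose similarity does not exceed the threshold contributes none.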

In this way, for a sentence of lyrics, if the similarity between the audio features of the audio data in the live streaming video data and the audio features of the audio data of the original singer is higher than the first preset threshold, the position with the highest similarity in that sentence is further selected, which indicates that the host sings the lyrics at this position especially well. The audio data in the live streaming video data corresponding to this position is determined, and the time point at which the determined audio data is played is taken as the first time point. At the first time point, the audio data of the host's live streaming is therefore most similar to the audio data of the original singer, that is, the host sings well at this moment, which can accordingly be regarded as a wonderful moment.

Specifically, the voice recognition algorithm can be any voice recognition algorithm, such as a Fast Endpoint Detection (FED) algorithm or the like.

In addition, in response to determining that the audio data in the live streaming video data is the audio data of a musical instrument performance, the server can identify the audio data in the live streaming video data to determine the name of the work played by the host, and then search for the audio data of an original performer playing the musical instrument based on the name of the work. The server aligns the audio data in the live streaming video data with the audio data of the original performer, and segments the two pieces of aligned audio data. For example, each of the two pieces of audio data is segmented into pieces of 5 seconds, which are sequentially numbered a1, a2, a3, . . . , ai, . . . , an for the audio data in the live streaming video data, and b1, b2, b3, . . . , bi, . . . , bn for the audio data of the original performer. The server can then extract the audio features of a1 and of b1 respectively, and calculate the similarity between them. If the similarity is greater than the first preset threshold, the position in a1 with the highest similarity to b1 is determined, and the live streaming time point corresponding to that position is taken as a first time point. By analogy, the first time points for the subsequent pieces a2, a3, and so on can be determined.
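The segmentation variant can be sketched as follows, assuming that both recordings have already been aligned and reduced to per-frame pitch values (in Hz) at a common frame rate. The per-frame similarity `1 - |a - b| / max(a, b)`, the mean as the segment similarity, and all names are hypothetical choices made for illustration, not the patent's definitions.

```python
from typing import List, Sequence


def segment(frames: Sequence[float], frames_per_segment: int) -> List[Sequence[float]]:
    """Split a frame sequence into consecutive fixed-length pieces (a1, a2, ...)."""
    return [frames[i:i + frames_per_segment]
            for i in range(0, len(frames), frames_per_segment)]


def frame_similarity(a: float, b: float) -> float:
    """Hypothetical similarity of two pitch values: 1 minus the relative difference."""
    if a <= 0 or b <= 0:
        return 0.0  # treat unvoiced frames (pitch 0) as dissimilar
    return 1.0 - abs(a - b) / max(a, b)


def first_time_points_by_segment(live_pitch: Sequence[float],
                                 original_pitch: Sequence[float],
                                 frame_rate: float = 10.0,
                                 segment_seconds: float = 5.0,
                                 threshold: float = 0.9) -> List[float]:
    """Compare ai with bi; for each matching segment keep the best frame's time."""
    n = max(1, int(round(segment_seconds * frame_rate)))  # frames per segment
    points = []
    pairs = zip(segment(live_pitch, n), segment(original_pitch, n))
    for k, (seg_a, seg_b) in enumerate(pairs):
        sims = [frame_similarity(x, y) for x, y in zip(seg_a, seg_b)]
        if not sims or sum(sims) / len(sims) <= threshold:
            continue
        best = max(range(len(sims)), key=sims.__getitem__)
        points.append((k * n + best) / frame_rate)  # seconds from stream start
    return points
```

Because the two series are assumed to be aligned, the frame index inside segment k maps directly back to a live streaming time.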

Likewise, in response to determining that the audio data in the above live streaming video data is the audio data of a sung song, the first time points can also be obtained through segmentation processing.

The audio features may be fundamental audio features, pitch audio features, and so on. The audio feature extraction algorithm may be an algorithm in the prior art, for example, the algorithm for extracting fundamental audio features used in existing music scoring systems. A specific process for extracting the audio features includes pre-emphasis, framing, windowing, computing the short-term average energy, and computing the autocorrelation. The fundamental audio features can be obtained through this process, whose primary parameters include a high-frequency boosting parameter, a frame length, a frame shift, and unvoiced and voiced thresholds.
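The listed steps map onto a classic autocorrelation pitch estimator. The sketch below uses NumPy and is only illustrative: the default pre-emphasis coefficient, frame length and shift, the single energy threshold standing in for the unvoiced/voiced thresholds, and the 60 to 500 Hz search range are all assumptions rather than values from the text.

```python
import numpy as np


def estimate_pitch(signal, sample_rate, pre_emphasis=0.97, frame_length=0.04,
                   frame_shift=0.01, energy_threshold=1e-3, fmin=60.0, fmax=500.0):
    """Per-frame pitch in Hz (0.0 for frames judged unvoiced)."""
    x = np.asarray(signal, dtype=float)
    if x.size == 0:
        return np.array([])
    # 1. Pre-emphasis (high-frequency boosting): y[n] = x[n] - k * x[n-1].
    x = np.append(x[0], x[1:] - pre_emphasis * x[:-1])
    flen = int(round(frame_length * sample_rate))
    fshift = int(round(frame_shift * sample_rate))
    window = np.hamming(flen)
    min_lag = int(sample_rate // fmax)  # shortest period searched
    max_lag = int(sample_rate // fmin)  # longest period searched
    pitches = []
    # 2. Framing with the given frame length and frame shift.
    for start in range(0, len(x) - flen + 1, fshift):
        frame = x[start:start + flen] * window  # 3. Windowing.
        energy = np.mean(frame ** 2)            # 4. Short-term average energy.
        if energy < energy_threshold:
            pitches.append(0.0)                 # Treated as unvoiced/silent.
            continue
        # 5. Autocorrelation; the strongest peak in the lag range is the period.
        ac = np.correlate(frame, frame, mode="full")[flen - 1:]
        lag = min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))
        pitches.append(sample_rate / lag)
    return np.array(pitches)
```

For a pure 200 Hz tone sampled at 8 kHz, the estimates cluster around 200 Hz; the integer-lag search limits precision to roughly one sample of period.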

In 103, obtaining a target video clip from the live streaming video data based on the target time point pair.

The server can obtain the target video clip from the live streaming video data based on the target time point pair. The target video clip refers to a video clip in the live streaming video data that includes first audio data, where the first audio data is the part of the audio data of the live streaming video data whose similarity with the audio data of the original performer meets a certain condition.

Specifically, the target video clip can be the video clip in the live streaming video data between the start time point and the end time point included in the target time point pair.

In an implementation of the present application, after determining the target time point pair, the server can find, based on the time stamps of the live streaming video data, the time stamp corresponding to the start time point and the time stamp corresponding to the end time point of the target time point pair, and can intercept the video clip between these two time stamps as the target video clip.
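The time stamp lookup can be sketched with a binary search over stored data units. The `Packet` type is a hypothetical stand-in for whatever units the server stores, and the sketch assumes that they are sorted by time stamp. In practice, cutting compressed video typically also requires starting at a keyframe or re-encoding, which this sketch ignores.

```python
import bisect
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Packet:
    """Hypothetical unit of live streaming data with a time stamp."""
    timestamp: float  # seconds from the start of the live streaming
    payload: bytes


def intercept_clip(packets: Sequence[Packet], start: float, end: float) -> List[Packet]:
    """Return the packets whose time stamps fall within [start, end]."""
    times = [p.timestamp for p in packets]
    lo = bisect.bisect_left(times, start)  # first packet at or after start
    hi = bisect.bisect_right(times, end)   # one past the last packet at or before end
    return list(packets[lo:hi])
```

For the target time point pair (15 s, 25 s), the call `intercept_clip(packets, 15.0, 25.0)` returns every stored unit in that window, endpoints included.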

In an implementation of the present application, after the target video clip is obtained, it may also be provided to the audience in the performance live streaming room. The corresponding processing can include:

generating link information of the target video clip, and sending the link information to the login terminals of accounts other than the host account in the performance live streaming room, so that these login terminals display the link information on a playback interface of the performance live streaming room, or on a live streaming end interface of the performance live streaming room.

Since the host account is logged in to the live streaming room during the live streaming, and the accounts of the audience watching the live streaming are also logged in to the live streaming room, the link information of the target video clip is, once generated, sent to the login terminals of the accounts other than the host account in the performance live streaming room. Since these login terminals are all installed with the live streaming application, they can display the link information, through the installed live streaming application, on the playback interface or on the live streaming end interface of the performance live streaming room.

The playback interface is an interface for displaying a playing link for playback of the live streaming video data, and the live streaming end interface refers to the interface displayed when the live streaming in the live streaming room ends.

In an implementation of the present application, after obtaining the target video clip, the server can randomly select a picture from the data of the target video clip as the cover of the target video clip and add a name to the target video clip (for example, the name of the song sung by the host can be used as the name of the target video clip), and then generate the link information based on the cover, the name, and the data storage address of the target video clip. The link information can be a Uniform Resource Locator (URL).
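A minimal sketch of assembling the link information and choosing its recipients follows. The endpoint `https://example.com/clip`, the query parameter names, the use of frame identifiers to refer to the cover picture, and both function names are hypothetical; the text only states that the link is built from the cover, the name, and the storage address, and that it can be a URL.

```python
import random
from typing import Dict, Iterable, List, Optional, Sequence
from urllib.parse import urlencode

# Hypothetical endpoint used only for illustration.
BASE_URL = "https://example.com/clip"


def build_link_information(frame_ids: Sequence[str], clip_name: str,
                           storage_address: str,
                           rng: Optional[random.Random] = None) -> Dict[str, str]:
    """Pick a random frame as the cover and pack cover, name, and address into a URL."""
    rng = rng or random.Random()
    cover = rng.choice(frame_ids)  # random picture from the clip as the cover
    query = urlencode({"name": clip_name, "src": storage_address, "cover": cover})
    return {"cover": cover, "name": clip_name, "url": f"{BASE_URL}?{query}"}


def recipients(logged_in_accounts: Iterable[str], host_account: str) -> List[str]:
    """Accounts logged into the room other than the host account."""
    return [a for a in logged_in_accounts if a != host_account]
```

`urlencode` percent-encodes the values, so names with spaces and storage paths with slashes are carried safely in the query string.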

The server can determine the accounts other than the host account among the accounts logged into the performance live streaming room, and send the link information of the target video clip to the login terminals of these other accounts. After receiving the link information, these login terminals can display the link information of the target video clip on the playback interface or on the live streaming end interface of the performance live streaming room through the installed live streaming application. For example, as shown in FIG. 2, the server obtains the link information of two video clips, one for “Miaoian” and the other for “Meow Meow Meow”, and the login terminals of the other accounts display the link information of the two video clips on the live streaming end interface. Specifically, the link information shown in FIG. 2 includes two video playback links.

If an audience member in the performance live streaming room wants to share certain link information, he/she can select the link information and then click the corresponding sharing option. After detecting the click instruction on the sharing option, the terminal used by the audience member displays various regional options for sharing, such as options for sharing in a certain application or in the current live streaming application. The audience member can select the corresponding regional option and confirm it with a click operation. In response to detecting the click operation confirming the option, the terminal displays an edit box containing preset content, such as “come and watch a song B sung by a host A”. The audience member can share the content in the edit box as it is, or re-edit it, and then share it to the region corresponding to the selected regional option, which completes the sharing process.

In an implementation of the present application, a process of screening the first time points is also provided, and the corresponding processing can include:

determining second time points in the live streaming video data based on interaction information of accounts other than the host account of the live streaming video data; for a target time point in the first time points, retaining the target time point if it belongs to the second time points, and deleting it if it does not; and determining, based on the preset interception time duration, the target time point pair corresponding to each retained first time point and centered at that retained first time point.

The interaction information may include one or more of comment information, like information, and gift information.

The target time point may be any time point in the first time points. That is, each of the first time points is used in turn as the target time point, and it is determined whether the target time point belongs to the second time points, so as to determine whether to retain or delete it.

In an implementation of the present application, after the live streaming in the performance live streaming room starts, the server can store the received comment information, like information, and gift information, and determine the second time points in the live streaming video data by using one or more of the comment information, like information, and gift information.

It is then determined whether the target time point in the first time points belongs to the second time points. If it does, the target time point is retained; if it does not, the target time point is deleted.

Then, taking each retained first time point as the center, the server can obtain the corresponding start time point as the time point half of the preset interception time duration before the retained first time point, and the corresponding end time point as the time point half of the preset interception time duration after it. The start time point and the end time point form the target time point pair. In this way, the first time points are screened based on the interaction information, so that the intercepted video clips are more likely to include the wonderful content.
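The screening step can be sketched as a membership filter followed by centered windows. The text does not say how closely a first time point must match a second time point in order to "belong" to it, so the `tolerance` parameter (exact match by default) is an assumption, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def screen_first_time_points(first_points: Sequence[float],
                             second_points: Sequence[float],
                             tolerance: float = 0.0) -> List[float]:
    """Retain a first time point only if it belongs to the second time points."""
    return [t for t in first_points
            if any(abs(t - s) <= tolerance for s in second_points)]


def centered_pairs(points: Sequence[float],
                   duration: float = 10.0) -> List[Tuple[float, float]]:
    """One (start, end) pair per point, extending half the duration on each side."""
    half = duration / 2
    return [(t - half, t + half) for t in points]
```

For example, with first time points at 20 s, 47 s, and 90 s and interaction peaks at 20 s, 46 s, and 120 s, only 20 s survives an exact match, giving the pair (15 s, 25 s); a 1 s tolerance would also retain 47 s.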

In view of the foregoing description, the second time points can be understood as time points at which audience interaction is frequent during the live streaming.

In an implementation of the present application, there is also provided another method for determining the target time point pair by using the interaction information, and the corresponding processing can include:

determining the second time points in the live streaming video data based on interaction information of accounts other than a host account of the live streaming video data; combining the first time points and the second time points, and performing deduplication processing on the combined time points; and determining, based on the preset interception time duration, the target time point pair corresponding to each time point obtained by the deduplication processing, taking that time point as the center.

In an implementation of the present application, after the live streaming in the performance live streaming room starts, the server can store the received comment information, like information, and gift information, and can determine the second time points in the live streaming video data by using one or more of the comment information, like information, and gift information.

Then the first time points and the second time points are combined, and duplicate time points in the combined time points are deleted, that is, deduplication processing is performed. For each deduplicated time point, the start time point is obtained by determining the time point that precedes it by half of the preset interception time duration, and the end time point is obtained by determining the time point that follows it by half of the preset interception time duration. The start time point and the end time point form the target time point pair.
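The combine-and-deduplicate variant can be sketched as below. The sketch assumes time points are expressed in seconds on a common grid (for example, whole seconds), so that exact set membership is a reasonable test for duplicates; sorting the result is only for readability.

```python
def union_and_pair(first_points, second_points, duration):
    """Combine first and second time points, remove duplicates, and build a
    (start, end) pair centered on each remaining point."""
    half = duration / 2
    # Set union performs the deduplication; sorting gives a stable order.
    merged = sorted(set(first_points) | set(second_points))
    return [(t - half, t + half) for t in merged]


print(union_and_pair([30.0, 95.0], [95.0, 120.0], duration=10))
# [(25.0, 35.0), (90.0, 100.0), (115.0, 125.0)]
```

Unlike the screening variant, this keeps every point found by either the audio comparison or the interaction information, so it favors recall over precision.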

In an implementation of the present application, there are many manners of determining the second time points based on the interaction information in the live streaming video data, and several feasible ones among them are given below.

In a first manner, if the amount of gift resources of the live streaming video data in a first time period exceeds a second preset threshold, a middle time point or an end time point of the first time period is determined as the second time point in the live streaming video data.

The duration of the first time period can be preset and stored in the server, and can be, for example, 2 seconds. The second preset threshold can also be preset and stored in the server.

In an implementation of the present application, the server may determine the first time periods in the live streaming video data based on the time stamps of the live streaming video data. The first time periods may be of the same duration, and the time interval between adjacent time periods may be equal. The time interval between adjacent time periods can be determined from their start time points or from their end time points. Furthermore, adjacent first time periods may or may not overlap.

For example, as shown in FIG. 3, the live streaming video data is 30 minutes long. The period from the 0th second to the 2nd second is the first of the first time periods, t1; the period from the 1st second to the 3rd second is the second of the first time periods, t2; the period from the 2nd second to the 4th second is the third of the first time periods, t3; and so on, so that multiple first time periods are selected. The start time point and the end time point of each first time period are determined, and the names and numbers of gifts carried in the gift requests received between the start time point and the end time point are determined, so as to count the gifts in that interval. The server can then obtain the resources for each type of gift carried, for example, 50 gold coins for a “yacht” gift. The number of gifts of each type is multiplied by the corresponding resources to obtain the amount of resources for that type, and these amounts are added up to obtain the amount of gift resources in the first time period. The server then compares the amount of gift resources in the first time period with the second preset threshold. If the amount of gift resources in the first time period is greater than the second preset threshold, the middle time point of the first time period can be determined as the second time point in the live streaming video data, or the end time point of the first time period can be determined as the second time point in the live streaming video data.
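The sliding-window gift counting in the example above can be sketched as follows. This is an illustrative sketch under stated assumptions: gift requests are modeled as hypothetical `(timestamp_s, gift_name, count)` tuples, a window is treated as the half-open interval `[start, end)`, and the price table and threshold values are invented for the example (only the 50-coin "yacht" price comes from the text).

```python
from collections import defaultdict


def gift_second_time_points(gift_events, prices, video_length,
                            window=2, step=1, threshold=100, use_middle=True):
    """Slide a `window`-second first time period over the video in steps of
    `step` seconds; return the middle (or end) time point of every period
    whose gift-resource amount exceeds `threshold`."""
    points = []
    start = 0
    while start + window <= video_length:
        end = start + window
        # Count gifts of each type received in [start, end).
        counts = defaultdict(int)
        for ts, name, n in gift_events:
            if start <= ts < end:
                counts[name] += n
        # Amount = sum over gift types of (number of gifts * resources per gift).
        amount = sum(prices[name] * n for name, n in counts.items())
        if amount > threshold:
            points.append(start + window / 2 if use_middle else end)
        start += step
    return points


prices = {"yacht": 50, "rose": 1}  # "rose" and its price are hypothetical
events = [(3.5, "yacht", 3), (10.2, "rose", 5)]
print(gift_second_time_points(events, prices, video_length=20))
# [3.0, 4.0]
```

Three yachts (150 coins) fall inside both the 2 s to 4 s and 3 s to 5 s periods, so both periods exceed the threshold; the rose gifts (5 coins) do not. With overlapping periods, one burst of gifts can yield several nearby second time points, which the later merging of overlapping pairs absorbs.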

In addition, the amount of gift resources can also be determined by an image recognition method, and the corresponding processing can include:

performing gift image recognition on the images of the live streaming video data in the first time period to obtain the number of images of each type of recognized gift; and determining the amount of gift resources in the first time period based on the number of images of each type of gift.

In an implementation of the present application, the server may obtain the images for each first time period from the live streaming video data and input them into a preset gift image recognition algorithm, which may be pre-trained, so as to identify the number of images of each type of gift contained in the images. The server then obtains the resources for each type of gift, multiplies the number of gifts of each type by the corresponding resources to obtain the amount of resources for that type, and adds these amounts up to obtain the amount of gift resources in the first time period.

Since a larger amount of gift resources indicates that the live streaming content is more wonderful, the amount of gift resources can be used for locating the wonderful content.

The gift image may refer to an area in the image that represents a gift.

Specifically, the gift image recognition algorithm may be a trained neural network. When an image is input to the neural network, the neural network can output the names of the gift images contained in the image, that is, the names of the gifts, and the number of gift images.

In a second manner, if the amount of comment information of the live streaming video data in a second time period exceeds a third preset threshold, the middle time point or the end time point of the second time period is determined as the second time point in the live streaming video data.

The duration of the second time period can also be preset and stored in the server, and can be, for example, 2 seconds. The third preset threshold can also be preset and stored in the server.

In an implementation of the present application, the server may determine the second time periods in the live streaming video data based on the time stamps of the live streaming video data. The second time periods may be of the same duration, and the time interval between adjacent time periods may be equal. Furthermore, adjacent second time periods may or may not overlap.

For example, the live streaming video data is 30 minutes long. The period from the 0th second to the 2nd second is the first of the second time periods, t1; the period from the 1st second to the 3rd second is the second of the second time periods, t2; the period from the 2nd second to the 4th second is the third of the second time periods, t3; and so on, so that multiple second time periods are selected. The start time point and the end time point of each second time period are determined, the amount of comment information received between the start time point and the end time point is determined, and this amount is compared with the third preset threshold. If the amount of received comment information is greater than the third preset threshold, the middle time point of the second time period can be determined as the second time point in the live streaming video data, or the end time point of the second time period can be determined as the second time point in the live streaming video data.
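Counting comments per second time period reduces to counting timestamps inside each window. The sketch below assumes comment arrival times are available in seconds and treats each period as the half-open interval `[start, end)`; binary search over the sorted timestamps (Python's standard `bisect` module) avoids rescanning every comment for each window. The window length, step, and threshold values are illustrative.

```python
from bisect import bisect_left


def comment_second_time_points(comment_times, video_length,
                               window=2, step=1, threshold=50, use_middle=True):
    """Return the middle (or end) time point of every `window`-second period
    in which more than `threshold` comments were received."""
    times = sorted(comment_times)
    points = []
    start = 0
    while start + window <= video_length:
        end = start + window
        # Number of comments with start <= t < end.
        n = bisect_left(times, end) - bisect_left(times, start)
        if n > threshold:
            points.append(start + window / 2 if use_middle else end)
        start += step
    return points


print(comment_second_time_points([5.1, 5.3, 5.8, 6.2, 9.0],
                                 video_length=12, threshold=2))
# [5.0, 6.0]
```

The same function applies unchanged to the third manner below if like-request timestamps are passed in place of comment timestamps and the fourth preset threshold is used.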

Since a larger amount of received comment information indicates that the live streaming content is more wonderful, the amount of comment information can be used for locating the wonderful content.

In a third manner, if the number of likes of the live streaming video data in a third time period exceeds a fourth preset threshold, the middle time point or the end time point of the third time period is determined as the second time point in the live streaming video data.

The duration of the third time period can also be preset and stored in the server, and can be, for example, 2 seconds. The fourth preset threshold can also be preset and stored in the server. During the live streaming, a like refers to clicking a preset mark in the live streaming interface.

In an implementation of the present application, the server may determine the third time periods in the live streaming video data based on the time stamps of the live streaming video data. The third time periods may be of the same duration, and the time interval between adjacent time periods may be equal. Furthermore, adjacent third time periods may or may not overlap.

For example, the live streaming video data is 30 minutes long. The period from the 0th second to the 2nd second is the first of the third time periods, t1; the period from the 1st second to the 3rd second is the second of the third time periods, t2; the period from the 2nd second to the 4th second is the third of the third time periods, t3; and so on, so that multiple third time periods are selected. The start time point and the end time point of each third time period are determined, and the number of like requests received, that is, the amount of like information received between the start time point and the end time point, is determined. The number of received like requests is compared with the fourth preset threshold. If the number of received like requests is greater than the fourth preset threshold, the middle time point of the third time period can be determined as the second time point in the live streaming video data, or the end time point of the third time period can be determined as the second time point in the live streaming video data.

Since a larger amount of received like information indicates that the live streaming content is more wonderful, the amount of like information can be used for locating the wonderful content.

In addition, the interaction information in the above first to third manners can be used in combination to determine the second time point, and the corresponding processing can include the following.

In an implementation of the present application, the amount of gift resources, the amount of comment information, and the amount of like information correspond to weights A, B, and C, respectively. For a fourth time period, if the amount of gift resources determined by the server is x, the amount of comment information is y, and the amount of like information is z, the weighted value is A*x + B*y + C*z. The weighted value is compared with a preset value, and if the weighted value is greater than the preset value, the middle time point of the fourth time period is determined as the second time point in the live streaming video data. In this way, the second time point is determined by considering all three types of interaction information together, which is more accurate.

The fourth time periods may be of the same duration, and the time interval between adjacent time periods may be equal. Furthermore, adjacent fourth time periods may or may not overlap.
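The weighted combination A*x + B*y + C*z for each fourth time period can be sketched as follows. The per-period tuple layout and the concrete weights and preset value are assumptions made for illustration; the text does not specify them. Setting one weight to zero gives the two-kind variant mentioned next.

```python
def weighted_second_time_points(periods, weights, preset_value):
    """periods: list of (start_s, end_s, gift_amount, comment_count, like_count)
    for each fourth time period; weights: (A, B, C).
    Returns the middle time point of each period whose weighted value
    A*x + B*y + C*z exceeds `preset_value`."""
    a, b, c = weights
    points = []
    for start, end, x, y, z in periods:
        weighted = a * x + b * y + c * z
        if weighted > preset_value:
            points.append((start + end) / 2)
    return points


periods = [(0, 2, 100, 10, 50), (1, 3, 0, 5, 20)]
print(weighted_second_time_points(periods, weights=(0.5, 1.0, 0.2), preset_value=60))
# [1.0]
```

The first period scores about 70 (50 + 10 + 10) and passes; the second scores about 9 and does not.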

In addition, two kinds of interaction information from the above first to third manners may be selected for the weighted calculation to determine the second time point. This is carried out in the same way as when all three kinds of interaction information are used, and thus will not be repeated here.

It should be noted that the durations of the first time period, the second time period, the third time period, and the fourth time period can be the same. To locate the wonderful content accurately, these durations are generally short, for example, less than 5 seconds.

In addition, to ensure that the determined target video clips do not contain duplicate content, the following processing may be performed after 102 and before 103.

In the target time point pairs, if a first start time point is earlier than a second start time point, the end time point corresponding to the first start time point is earlier than the end time point corresponding to the second start time point, and the second start time point is earlier than the end time point corresponding to the first start time point, then the end time point corresponding to the first start time point is replaced with the end time point corresponding to the second start time point, and the second start time point and its corresponding end time point are deleted from the target time point pairs.

The first start time point is different from the second start time point: the first start time point is any start time point other than the second start time point in the target time point pairs, and the second start time point is any start time point other than the first start time point in the target time point pairs.

That is, the first start time point and the second start time point are start time points included in different time point pairs of the target time point pairs.

In an implementation of the present application, after determining the target time point pairs, the server can determine whether any two pairs have overlapping time ranges. If so, that is, if the target time point pairs contain a first start time point and a second start time point such that the first start time point is earlier than the second start time point, the end time point corresponding to the first start time point is earlier than the end time point corresponding to the second start time point, and the second start time point is earlier than the end time point corresponding to the first start time point, then the end time point corresponding to the first start time point can be replaced with the end time point corresponding to the second start time point, and the second start time point and its corresponding end time point can be deleted. As a result, the two pairs become a single pair consisting of the first start time point and the end time point corresponding to the second start time point. In this way, when the video clips are subsequently obtained, video clips with duplicate content are merged into one video clip.

For example, the first start time point is at the 23rd second of the 10th minute (10′23″), and its corresponding end time point is at the 33rd second of the 10th minute (10′33″); the second start time point is at the 25th second of the 10th minute (10′25″), and its corresponding end time point is at the 35th second of the 10th minute (10′35″). After merging, the first start time point is at the 23rd second of the 10th minute (10′23″), and its corresponding end time point is at the 35th second of the 10th minute (10′35″).
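The merging rule above can be sketched as a single pass over the pairs sorted by start time. The sketch applies the three stated conditions literally (s1 < s2, e1 < e2, and s2 < e1), so pairs that merely touch, or a pair nested entirely inside another, are left unchanged; processing the pairs in start order is an assumption of the sketch. Times are in seconds, so 10′23″ becomes 623.

```python
def merge_overlapping_pairs(pairs):
    """Merge (start, end) pairs that partially overlap: when s1 < s2,
    e1 < e2 and s2 < e1, replace e1 with e2 and drop (s2, e2)."""
    merged = []
    for start, end in sorted(pairs):
        if merged:
            s1, e1 = merged[-1]
            if s1 < start and e1 < end and start < e1:
                # Extend the earlier pair to the later end; discard this pair.
                merged[-1] = (s1, end)
                continue
        merged.append((start, end))
    return merged


# 10'23"-10'33" and 10'25"-10'35" from the example above.
print(merge_overlapping_pairs([(625, 635), (623, 633)]))
# [(623, 635)]
```

The same function can be applied to the start and end times of the target video clips in the clip-merging step that follows; for the clips from 10′30″ to 10′40″ and from 10′35″ to 10′45″ it returns a single span from 630 to 645 seconds.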

In an implementation of the present application, to ensure that the determined target video clips do not contain duplicate content, the following processing may also be performed after 103.

If the start time point of a first video clip in the target video clips is earlier than the start time point of a second video clip in the target video clips, the end time point of the first video clip is earlier than the end time point of the second video clip, and the start time point of the second video clip is earlier than the end time point of the first video clip, the first video clip and the second video clip are merged.

The first video clip is any video clip other than the second video clip in the target video clips, and the second video clip is any video clip other than the first video clip in the target video clips.

That is, the first video clip and the second video clip are different video clips in the target video clips.

In an implementation of the present application, after determining the target video clips, the server can determine whether any two of the video clips overlap. If so, that is, if there are a first video clip and a second video clip such that the start time of the first video clip is earlier than the start time of the second video clip, the end time of the first video clip is earlier than the end time of the second video clip, and the start time of the second video clip is earlier than the end time of the first video clip, the server can merge the first video clip and the second video clip, so that video clips with duplicate content are merged into one video clip.

For example, if the first video clip runs from the 30th second of the 10th minute (10′30″) to the 40th second of the 10th minute (10′40″), and the second video clip runs from the 35th second of the 10th minute (10′35″) to the 45th second of the 10th minute (10′45″), then the merged video clip runs from the 30th second of the 10th minute (10′30″) to the 45th second of the 10th minute (10′45″).

In an implementation of the present application, to make the target video clips more likely to include the wonderful content, the target video clips can be screened based on the interaction information, and the following processing may be performed after 103.

If the amount of gift resources in the target video clip exceeds a fifth preset threshold, the target video clip is retained; if the amount of comment information of the target video clip exceeds a sixth preset threshold, the target video clip is retained; or, if the amount of like information of the target video clip exceeds a seventh preset threshold, the target video clip is retained.

The fifth preset threshold, the sixth preset threshold, and the seventh preset threshold can all be preset and stored in the server.

In an implementation of the present application, after obtaining the target video clip, the server can determine the amount of gift resources in the target video clip in the same way as the amount of gift resources in the first time period is determined, which will not be repeated here. It is determined whether the amount of gift resources exceeds the fifth preset threshold. If so, the target video clip is retained; otherwise, the target video clip may not contain wonderful content and thus can be deleted.

Alternatively, after obtaining the target video clip, the server can determine the amount of comment information of the target video clip in the same way as the amount of comment information in the second time period is determined, which will not be repeated here. It is determined whether the amount of comment information exceeds the sixth preset threshold. If so, the target video clip is retained; otherwise, the target video clip may not contain wonderful content and thus can be deleted.

Alternatively, after obtaining the target video clip, the server can determine the amount of like information of the target video clip in the same way as the amount of like information in the third time period is determined, which will not be repeated here. It is determined whether the amount of like information exceeds the seventh preset threshold. If so, the target video clip is retained; otherwise, the target video clip may not contain wonderful content and thus can be deleted.

In this way, the obtained video clips can be further screened by the interaction information, which increases the probability that the intercepted video clips include the wonderful content.
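Because the text presents the three screening criteria as alternatives, the sketch below selects one metric per call. The `Clip` record and its field names are hypothetical; in practice each field would hold the amount measured during the live streaming time period corresponding to the clip.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    start: float          # seconds
    end: float            # seconds
    gift_amount: float    # amount of gift resources during the clip
    comment_count: int    # amount of comment information during the clip
    like_count: int       # amount of like information during the clip


# Hypothetical mapping from the chosen criterion to the Clip field it checks.
METRIC_FIELD = {"gift": "gift_amount", "comment": "comment_count", "like": "like_count"}


def screen_clips(clips, metric, threshold):
    """Retain clips whose chosen interaction metric exceeds `threshold`
    (the fifth, sixth, or seventh preset threshold, respectively)."""
    field = METRIC_FIELD[metric]
    return [c for c in clips if getattr(c, field) > threshold]


clips = [Clip(623, 635, 400, 30, 120), Clip(900, 910, 20, 2, 5)]
print([(c.start, c.end) for c in screen_clips(clips, "gift", threshold=100)])
# [(623, 635)]
```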

In an implementation of the present application, the number of target video clips determined in 103 may be relatively large. When the number of target video clips exceeds a preset number, the following filtering processing may be performed, and the corresponding processing may include the following.

The determined target video clips are sorted in descending order of the amount of gift resources, and the preset number of top target video clips are obtained and determined as the final video clips. Alternatively, the determined target video clips are sorted in descending order of the amount of comment information, and the preset number of top target video clips are obtained and determined as the final video clips. Alternatively, the determined target video clips are sorted in descending order of the amount of like information, and the preset number of top target video clips are obtained and determined as the final video clips.

The preset number may be set in advance and indicates the number of video clips that are finally fed back to the terminal.

In an implementation of the present application, after obtaining the target video clips, the server can determine the amount of gift resources in each target video clip in the same way as the amount of gift resources in the first time period is determined, which will not be repeated here. The target video clips are sorted in descending order of the amount of gift resources, and the preset number of top target video clips are obtained and determined as the final video clips.

Alternatively, after obtaining the target video clips, the server can determine the amount of comment information of each target video clip in the same way as the amount of comment information in the second time period is determined, which will not be repeated here. The target video clips are sorted in descending order of the amount of comment information, and the preset number of top target video clips are obtained and determined as the final video clips.

Alternatively, after obtaining the target video clips, the server can determine the amount of like information of each target video clip in the same way as the amount of like information in the third time period is determined, which will not be repeated here. The target video clips are sorted in descending order of the amount of like information, and the preset number of top target video clips are obtained and determined as the final video clips.

In addition, in this process, the various kinds of interaction information can also be combined and weighted. For example, after the amount of like information, the amount of comment information, and the amount of gift resources are weighted, the target video clips are sorted in descending order of the weighted values, and the preset number of top target video clips are obtained and determined as the final video clips fed back to the terminal.
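The top-N selection can be sketched with one weighted score that covers both the single-metric alternatives and the combined variant: weights of (1, 0, 0), (0, 1, 0), or (0, 0, 1) sort by gift resources, comments, or likes alone. The `Clip` record, its fields, and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    name: str
    gift_amount: float
    comment_count: int
    like_count: int


def top_clips(clips, preset_number, weights=(1.0, 0.0, 0.0)):
    """Sort clips in descending order of the weighted interaction value and
    keep the first `preset_number` as the final video clips."""
    a, b, c = weights

    def weighted_value(clip):
        return a * clip.gift_amount + b * clip.comment_count + c * clip.like_count

    return sorted(clips, key=weighted_value, reverse=True)[:preset_number]


clips = [Clip("c1", 300, 12, 80), Clip("c2", 50, 40, 500), Clip("c3", 120, 5, 10)]
print([c.name for c in top_clips(clips, 2)])                     # ['c1', 'c3']
print([c.name for c in top_clips(clips, 2, weights=(0, 0, 1))])  # ['c2', 'c1']
```

Python's `sorted` is stable, so clips with equal weighted values keep their original relative order; a real deployment might prefer an explicit tie-breaker such as clip start time.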

It should be noted that determining the amount of gift resources, comment information, and like information of a video clip can be understood as determining the amount of gift resources, comment information, and like information during the live streaming time period corresponding to the video clip.

In the implementation of the present application, when a video clip is obtained from the live streaming video data of the performance live streaming room, the target time point pair of the live streaming video data is determined by using the audio data of the live streaming video data and the audio data of the original performer, and the target video clip is obtained based on the start time point and the end time point in the target time point pair. Since the server performs the video interception directly based on the audio data of the live streaming video data and the audio data of the original performer, without the recording button being operated manually, there is no time gap between the start of the wonderful content and the start of recording the video data displayed on the screen, and thus the intercepted video clips are relatively complete.

FIG. 4 is a block diagram showing a device for obtaining a video clip according to an example implementation. Referring to FIG. 4, the device includes an obtaining unit 411 and a determining unit 412.

The obtaining unit 411 is configured to obtain live streaming video data in a performance live streaming room.

The determining unit 412 is configured to determine a target time point pair of the live streaming video data based on audio data of the live streaming video data and audio data of an original performer, where the target time point pair includes a start time point and an end time point.

The obtaining unit 411 is further configured to obtain a target video clip from the live streaming video data based on the target time point pair.

Optionally, the determining unit 412 is configured to:

-   determine a first time point in the live streaming video data based on the audio data of the live streaming video data and the audio data of the original performer; and
-   determine, based on a preset interception time duration, the target time point pair corresponding to the first time point, centered at the first time point.

Optionally, the audio data of the live streaming video data is the audio data of a song sung by a host, and the audio data of the original performer is the audio data of the song sung by the original singer. The determining unit 412 is configured to:

-   perform voice recognition on the audio data of the live streaming video data to obtain lyrics of the song; obtain the audio data of the song sung by the original singer based on the lyrics; for each sentence of the lyrics, determine a similarity between audio features of the audio data of the song sung by the original singer and audio features of the audio data of the live streaming video data, as a lyric similarity; and determine the time point in the audio data of the live streaming video data corresponding to the position in the lyrics with the highest lyric similarity above a first preset threshold, as the first time point of the live streaming video data.
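The per-sentence selection step above can be sketched as follows. The text does not specify the audio features or the similarity measure, so this sketch assumes each lyric sentence already has one feature vector for the host's rendition and one for the original singer's (for example, averaged spectral features), and it uses cosine similarity as a stand-in measure. Voice recognition and feature extraction are outside the sketch.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def first_time_point(sentences, threshold):
    """sentences: list of (live_time_s, live_features, original_features),
    one entry per lyric sentence. Returns the live-stream time point of the
    sentence with the highest lyric similarity above `threshold`, or None."""
    best_time, best_sim = None, threshold
    for live_time, live_feat, orig_feat in sentences:
        sim = cosine_similarity(live_feat, orig_feat)
        if sim > best_sim:  # strictly above the threshold and the best so far
            best_time, best_sim = live_time, sim
    return best_time


sentences = [
    (12.0, [1, 0, 0], [0.9, 0.1, 0]),
    (30.5, [1, 1, 0], [1, 1, 0]),
    (48.0, [0, 1, 0], [1, 0, 0]),
]
print(first_time_point(sentences, threshold=0.8))
# 30.5
```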

Optionally, the determining unit 412 is further configured to determine a second time point in the live streaming video data based on interaction information of accounts other than a host account of the live streaming video data.

The determining unit 412 is configured to:

-   if a target time point in the first time points belongs to the second time points, retain the target time point, and if the target time point in the first time points does not belong to the second time points, delete the target time point; and
-   determine, based on the preset interception time duration, the target time point pair corresponding to the retained first time point, taking the retained first time point as the center.

Optionally, the determining unit 412 is further configured to:

-   if the amount of gift resources of the live streaming video data in a first time period exceeds a second preset threshold, determine a middle time point or an end time point of the first time period as the second time point in the live streaming video data;
-   if the amount of comment information of the live streaming video data in a second time period exceeds a third preset threshold, determine the middle time point or the end time point of the second time period as the second time point in the live streaming video data; and/or
-   if the number of likes of the live streaming video data in a third time period exceeds a fourth preset threshold, determine the middle time point or the end time point of the third time period as the second time point in the live streaming video data.

Optionally, the determining unit 412 is further configured to:

-   perform gift image recognition on images in the live streaming video data for the first time period to obtain the number of recognized gift images; and
-   determine the amount of gift resources in the first time period based on the number of gift images.

Optionally, the determining unit 412 is further configured to:

-   if, in the target time point pairs, a first start time point is earlier than a second start time point, the end time point corresponding to the first start time point is earlier than the end time point corresponding to the second start time point, and the second start time point is earlier than the end time point corresponding to the first start time point, replace the end time point corresponding to the first start time point with the end time point corresponding to the second start time point, and delete the second start time point and its corresponding end time point from the target time point pairs. The first start time point and the second start time point are start time points included in different time point pairs of the target time point pairs.

Optionally, the determining unit 412 is further configured to generate link information of the target video clip.

As shown in FIG. 5, the device further includes:

-   a sending unit 413 configured to send the link information to the login terminals of accounts other than the host account in the performance live streaming room, so that these login terminals display the link information on a playback interface of the performance live streaming room or on a live streaming end interface of the performance live streaming room.

Optionally, the obtaining unit 411 is further configured to:

-   if the amount of gift resources of the target video clip exceeds a fifth preset threshold, retain the target video clip;
-   if the amount of comment information of the target video clip exceeds a sixth preset threshold, retain the target video clip; or
-   if the amount of like information of the target video clip exceeds a seventh preset threshold, retain the target video clip.

In the implementation of the present application, in response to obtaining video clips from the live streaming video data of the performance live streaming room, the target time point pairs of the live streaming video data are determined by using the audio data of the live streaming video data and the audio data of the original performer, and the target video clips are obtained based on the start time points and the end time points in the target time point pairs. The server intercepts the video clips directly based on the audio data of the live streaming video data and the audio data of the original performer, and nobody has to operate a recording button manually. As a result, there is no delay between the start of the highlight content and the start of recording the video data displayed on the screen, so the intercepted video clips are relatively complete.
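The summary above does not show how a target time point pair is formed around a detected time point. Claims 2 and 4 describe one approach. First, keep only those first time points (from the audio comparison) that also belong to the second time points (from viewer interaction). Then center a pair on each kept point using the preset interception time duration. The following minimal Python sketch follows that description. The name `target_time_point_pairs`, the clamping of each pair to the bounds of the stream, and the optional `tolerance` for near matches are assumptions that do not appear in the application. A `tolerance` of 0 gives the exact membership test stated in claim 4.

```python
from typing import Iterable, List, Tuple


def target_time_point_pairs(
    first_points: Iterable[float],
    second_points: Iterable[float],
    interception_duration: float,
    video_duration: float,
    tolerance: float = 0.0,
) -> List[Tuple[float, float]]:
    """Build (start, end) pairs centered on first time points confirmed by interaction."""
    second = list(second_points)
    half = interception_duration / 2
    pairs: List[Tuple[float, float]] = []
    for t in first_points:
        # Keep t only if it belongs to the second time points; a tolerance
        # greater than 0 (an assumption) also accepts near matches.
        if not any(abs(t - s) <= tolerance for s in second):
            continue  # delete this first time point
        # Center the pair on t; clamping to [0, video_duration] is an assumption.
        start = max(0.0, t - half)
        end = min(float(video_duration), t + half)
        pairs.append((start, end))
    return pairs
```

For example, with first time points at 30 s, 100 s and 200 s, second time points at 30 s and 200 s, a 20 s interception duration, and a 205 s stream, the sketch returns (20.0, 40.0) and (190.0, 205.0). Any overlapping pairs in the result could then be merged as described for the determining unit 412.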

Regarding the device in the foregoing implementations, the specific manners in which the units perform operations have been described in detail in the implementations of the related methods, and will not be detailed here.

FIG. 6 is a schematic structural diagram of a server provided by an implementation of the present application. The server 600 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 601 and one or more memories 602. The memories 602 store at least one instruction, which is loaded and executed by the CPU 601 to carry out the operations of the method for obtaining the video clip.

In an implementation of the present application, there is provided another server, including a processor and a memory for storing instructions executable by the processor, where the processor is configured to perform the operations of the method for obtaining the video clip.

FIG. 7 is a block diagram showing a server 700 according to an example implementation. Referring to FIG. 7, the server 700 includes a processing component 722, which further includes one or more processors, and memory resources represented by a memory 732 for storing instructions executable by the processing component 722, such as an application program. The application program stored in the memory 732 may include one or more modules, each of which corresponds to a set of instructions. In addition, the processing component 722 is configured to execute the instructions to perform the operations of the method for obtaining the video clip.

The server 700 may also include a power supply component 726 configured to perform power management of the server 700, a wired or wireless network interface 750 configured to connect the server 700 to a network, and an input/output (I/O) interface 758. The server 700 can operate based on an operating system stored in the memory 732, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or similar operating systems.

In an implementation of the present application, there is also provided a device for obtaining a video clip, including a processor and a memory for storing instructions executable by the processor, where the processor is configured to perform the operations of the method for obtaining the video clip.

In an implementation of the present application, there is provided a non-transitory computer-readable storage medium having stored therein instructions which, when executed by a processor of a server, cause the server to perform the operations of the method for obtaining the video clip.

In the implementations of the present application, there is also provided an application program, including one or more instructions which can be executed by a processor of a server to carry out the operations of the method for obtaining the video clip.

Other implementations of the present disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the present disclosure disclosed herein. The present application is intended to cover any variations, uses, or adaptations of the present disclosure that follow the general principles of the present disclosure and include common general knowledge or conventional technical means in the art that are not disclosed in the present disclosure. The specification and implementations are merely illustrative, and the true scope and spirit of the present disclosure are defined by the appended claims.

It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and various modifications and changes can be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

The above are only the preferred implementations of the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application shall fall within the protection scope of the present application.

What is claimed is:
 1. A method for obtaining a video clip, comprising: obtaining live streaming video data of a performance live streaming room; determining target time point pairs of the live streaming video data based on audio data of the live streaming video data and audio data of an original performer, wherein each of the target time point pairs comprises a start time point and an end time point; obtaining a candidate target video clip from the live streaming video data based on the target time point pairs; and determining the candidate target video clip as a target video clip in response to at least one of: an amount of gift resources of the candidate target video clip exceeding a gift resource threshold, an amount of comment information of the candidate target video clip exceeding a comment information threshold, or an amount of like information of the candidate target video clip exceeding a like information threshold.
 2. The method according to claim 1, wherein said determining the target time point pairs of the live streaming video data based on the audio data of the live streaming video data and the audio data of the original performer comprises: determining first time points of the live streaming video data based on the audio data of the live streaming video data and the audio data of the original performer; and determining the target time point pairs corresponding to the first time points centered at the first time points based on a preset interception time duration.
 3. The method according to claim 2, wherein the audio data of the live streaming video data is audio data of a song sung by a host, and the audio data of the original performer is audio data of the song sung by the original singer; said determining the first time points of the live streaming video data based on the audio data of the live streaming video data and the audio data of the original performer comprises: obtaining lyrics of the song by performing voice recognition on the audio data of the live streaming video data; obtaining the audio data of the song sung by the original singer based on the lyrics; determining a lyric similarity between audio features of the audio data of the song sung by the original singer and audio features of the audio data of the live streaming video data for each sentence of the lyrics; and determining a time point corresponding to a position in the lyrics with a highest lyric similarity above a lyric information threshold as the first time point of the live streaming video data.
 4. The method according to claim 2, further comprising: determining second time points of the live streaming video data based on interaction information of accounts other than a host account of the live streaming video data; wherein said determining the target time point pairs corresponding to the first time points centered at the first time points based on the preset interception time duration comprises: determining a time point of the first time points as a target time point in response to determination that the time point belongs to the second time points, and deleting the time point in response to determination that the time point does not belong to the second time points; and determining the target time point pair corresponding to the target time point centered at the target time point based on the preset interception time duration.
 5. The method according to claim 4, wherein said determining the second time points in the live streaming video data based on interaction information of accounts other than a host account of the live streaming video data comprises: determining a middle time point or an end time point of a first time period as the second time point of the live streaming video data in response to an amount of gift resources of the live streaming video data in the first time period exceeding another gift resource threshold; determining a middle time point or an end time point of a second time period as the second time point of the live streaming video data in response to an amount of comment information of the live streaming video data in the second time period exceeding another comment information threshold; or, determining a middle time point or an end time point of a third time period as the second time point of the live streaming video data in response to a number of likes of the live streaming video data in the third time period exceeding another like information threshold.
 6. The method according to claim 5, further comprising: obtaining a number of each type of recognized gift images by recognizing gift images in the live streaming video data for the first time period; and determining the amount of the gift resources in the first time period based on the number of each type of gift images.
 7. The method according to claim 1, further comprising: replacing a first end time point with a second end time point and deleting a second start time point and the second end time point from the target time point pairs in response to a first start time point being earlier than the second start time point, the first end time point being earlier than the second end time point, and the second start time point being earlier than the first end time point, wherein the first end time point corresponds to the first start time point, the second end time point corresponds to the second start time point, and the first start time point and the second start time point are different and are comprised in the target time point pairs.
 8. The method according to claim 1, further comprising: generating link information of the target video clip; and sending the link information to login terminals of accounts other than a host account in the performance live streaming room for displaying the link information on a playback interface or a live streaming end interface of the performance live streaming room on the login terminals of the other accounts.
 9. A device for obtaining a video clip, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to perform operations comprising: obtaining live streaming video data of a performance live streaming room; determining target time point pairs of the live streaming video data based on audio data of the live streaming video data and audio data of an original performer, wherein each of the target time point pairs comprises a start time point and an end time point; obtaining a candidate target video clip from the live streaming video data based on the target time point pairs; and determining the candidate target video clip as a target video clip in response to an amount of gift resources of the candidate target video clip exceeding a gift resource threshold, or in response to an amount of comment information of the candidate target video clip exceeding a comment information threshold, or in response to an amount of like information of the candidate target video clip exceeding a like information threshold.
 10. The device according to claim 9, wherein said determining the target time point pairs of the live streaming video data based on the audio data of the live streaming video data and the audio data of the original performer comprises: determining first time points of the live streaming video data based on the audio data of the live streaming video data and the audio data of the original performer; and determining the target time point pairs corresponding to the first time points centered at the first time points based on a preset interception time duration.
 11. The device according to claim 10, wherein the audio data of the live streaming video data is audio data of a song sung by a host, and the audio data of the original performer is audio data of the song sung by the original singer; said determining the first time points of the live streaming video data based on the audio data of the live streaming video data and the audio data of the original performer comprises: obtaining lyrics of the song by performing voice recognition on the audio data of the live streaming video data; obtaining the audio data of the song sung by the original singer based on the lyrics; determining a lyric similarity between audio features of the audio data of the song sung by the original singer and audio features of the audio data of the live streaming video data for each sentence of the lyrics; and determining a time point corresponding to a position in the lyrics with a highest lyric similarity above a lyric information threshold as the first time point of the live streaming video data.
 12. The device according to claim 10, wherein the operations further comprise: determining second time points of the live streaming video data based on interaction information of accounts other than a host account of the live streaming video data; wherein said determining the target time point pairs corresponding to the first time points centered at the first time points based on the preset interception time duration comprises: determining a time point of the first time points as a target time point in response to determination that the time point belongs to the second time points, and deleting the time point in response to determination that the time point does not belong to the second time points; and determining the target time point pair corresponding to the target time point centered at the target time point based on the preset interception time duration.
 13. The device according to claim 12, wherein said determining the second time points in the live streaming video data based on interaction information of accounts other than a host account of the live streaming video data comprises: determining a middle time point or an end time point of a first time period as the second time point of the live streaming video data in response to an amount of gift resources of the live streaming video data in the first time period exceeding another gift resource threshold; determining a middle time point or an end time point of a second time period as the second time point of the live streaming video data in response to an amount of comment information of the live streaming video data in the second time period exceeding another comment information threshold; or, determining a middle time point or an end time point of a third time period as the second time point of the live streaming video data in response to a number of likes of the live streaming video data in the third time period exceeding another like information threshold.
 14. The device according to claim 13, wherein the operations further comprise: obtaining a number of each type of recognized gift images by recognizing gift images in the live streaming video data for the first time period; and determining the amount of the gift resources in the first time period based on the number of each type of gift images.
 15. The device according to claim 9, wherein the operations further comprise: replacing a first end time point with a second end time point and deleting a second start time point and the second end time point from the target time point pairs in response to a first start time point being earlier than the second start time point, the first end time point being earlier than the second end time point, and the second start time point being earlier than the first end time point, wherein the first end time point corresponds to the first start time point, the second end time point corresponds to the second start time point, and the first start time point and the second start time point are different and are comprised in the target time point pairs.
 16. The device according to claim 9, wherein the operations further comprise: generating link information of the target video clip; and sending the link information to login terminals of accounts other than a host account in the performance live streaming room for displaying the link information on a playback interface or a live streaming end interface of the performance live streaming room on the login terminals of the other accounts.
 17. A non-transitory computer-readable storage medium having stored thereon instructions which, when executed by a processor of a server, cause the server to perform operations comprising: obtaining live streaming video data of a performance live streaming room; determining target time point pairs of the live streaming video data based on audio data of the live streaming video data and audio data of an original performer, wherein each of the target time point pairs comprises a start time point and an end time point; obtaining a candidate target video clip from the live streaming video data based on the target time point pairs; and determining the candidate target video clip as a target video clip in response to an amount of gift resources of the candidate target video clip exceeding a gift resource threshold, or in response to an amount of comment information of the candidate target video clip exceeding a comment information threshold, or in response to an amount of like information of the candidate target video clip exceeding a like information threshold.
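Claims 3 and 11 determine a first time point by comparing the host's rendition with the original singer's rendition sentence by sentence. They then take the time point of the sentence whose lyric similarity is highest and also above the threshold. The following minimal Python sketch shows that selection step only. It assumes that one fixed-length audio feature vector has already been extracted per lyric sentence for each rendition, together with the start time of each sentence in the live stream. The use of cosine similarity is a substitute chosen for illustration, because the claims do not name a similarity measure. The names `cosine_similarity` and `first_time_point` are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def first_time_point(host_features: List[Sequence[float]],
                     original_features: List[Sequence[float]],
                     sentence_times: List[float],
                     threshold: float) -> Optional[float]:
    """Return the time of the lyric sentence with the highest similarity above the threshold."""
    best_index: Optional[int] = None
    best_score = threshold  # a sentence must strictly exceed the threshold
    for i, (host, original) in enumerate(zip(host_features, original_features)):
        score = cosine_similarity(host, original)
        if score > best_score:
            best_index, best_score = i, score
    return None if best_index is None else sentence_times[best_index]
```

If no sentence exceeds the threshold, the sketch returns `None`, meaning that no first time point is produced for that song.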